Internationalized Back-of-the-Book Indexes for XSL Formatting Objects

Authors

  • W. Eliot Kimber
  • Joshua Reynolds
Abstract

The XSL recommendation and its commercial implementations have finally reached a point of maturity that allows for their use in generating production-quality printed technical documents. However, the XSL specification does not provide any built-in features specific to the production of back-of-the-book indexes. In addition, the current commercial tools do not provide any proprietary indexing features comparable to those found in existing page composition systems such as Arbortext's Epic Publisher and XyEnterprise XPP. Thus, if XSL-FO is to be used to produce documents that have back-of-the-book indexes, the style sheet author must implement all the processing necessary to create the index. This task is challenging enough for Latin languages. For indexes in other languages, the task is further complicated by the need to support locale-specific collation schemes and index groupings. Defining collation and grouping rules for languages such as Traditional and Simplified Chinese, which use tens of thousands of unique characters, complicates the task further still. This paper describes in detail the system developed by the authors to meet the challenge of producing back-of-the-book indexes using XSL-FO for a number of Asian and Middle-Eastern languages, including Thai, Japanese, Korean, Arabic, Hebrew, and Chinese (Traditional and Simplified).

The solution developed includes the following key components:

  • XSL business logic for processing the index entries in order to generate the index pages themselves
  • Java libraries that support index item grouping and sorting for arbitrary national languages and locales, configured through an XML document in which the index groups and sorting rules are easily specified
  • a PDF post-processing step that removes duplicate page numbers from the index pages produced by the XSL-FO process

The solution developed is completely generic and can be easily adapted to any typical technical documentation document type that uses embedded index entry markup (e.g., any DocBook-like document type). The indexes are configured through relatively simple XML documents that define the index groups, the sorting and grouping strategies to use and, if necessary, locale-specific collation orders for index entry and group sorting. The paper describes how the system takes full advantage of Java's built-in internationalization support to simplify both the initial development task and the definition of custom index configurations. The paper specifically discusses the indexing challenges posed by the national languages involved and how the XSL and Java code developed meets those challenges. It also discusses the general issues involved in generating indexes with XSL-FO, including locale-specific sorting, grouping using the Muenchian Method, and use of Java's built-in support for locale-specific collation.
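
As a rough illustration of the kind of locale-aware sorting and grouping the abstract refers to, the following minimal sketch uses Java's standard `java.text.Collator`. It is not the authors' library: the class name `IndexSketch`, the methods `sortTerms` and `groupByFirstLetter`, and the grouping rule (first code point of each term) are assumptions made only for this example. The system described in the abstract reads its groups and collation rules from an XML configuration, which is not shown here.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the API of the system described in the paper.
public class IndexSketch {

    // Builds a collator for the locale at the requested comparison strength.
    private static Collator collatorFor(Locale locale, int strength) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        // Decompose accented characters so they compare against their base letters.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator;
    }

    // Returns a copy of the terms sorted by the locale's collation rules.
    // SECONDARY strength ignores case differences but still distinguishes accents.
    static List<String> sortTerms(List<String> terms, Locale locale) {
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(collatorFor(locale, Collator.SECONDARY));
        return sorted;
    }

    // Groups sorted terms under a heading taken from each term's first code point.
    // The map is keyed with a PRIMARY-strength collator, so headings that differ
    // only by accent or case (for example "E" and "É") share a single group.
    // Assumption: every term is non-empty.
    static Map<String, List<String>> groupByFirstLetter(List<String> terms, Locale locale) {
        Map<String, List<String>> groups = new TreeMap<>(collatorFor(locale, Collator.PRIMARY));
        for (String term : sortTerms(terms, locale)) {
            int end = term.offsetByCodePoints(0, 1); // keep surrogate pairs intact
            String heading = term.substring(0, end).toUpperCase(locale);
            groups.computeIfAbsent(heading, k -> new ArrayList<>()).add(term);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("zebra", "Éclair", "apple", "eagle", "Zulu");
        groupByFirstLetter(terms, Locale.FRENCH)
            .forEach((heading, entries) -> System.out.println(heading + ": " + entries));
    }
}
```

For languages whose index groups are not simply first letters (for example, Chinese grouped by stroke count or Japanese grouped by kana row), a first-code-point rule like this one would not be sufficient; a custom `java.text.RuleBasedCollator` or an explicit group table would be one plausible way to express such rules.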

Similar resources

Using XSL Formatting Objects for Production-Quality Internationalized Document Printing

The XSL Formatting Objects (XSL-FO) specification was designed from the start to be locale and language neutral. This makes XSL-FO well suited to the task of composing for print internationalized documents, and in particular, documents in non-Western languages. However, users of XSL-FO are dependent on both the implementation of XSL-FO internationalization features, such as support for differen...

Introducing LaTeX users to XSL-FO

This talk aims to introduce LaTeX users to XSL-FO. It does not attempt to give an exhaustive view of XSL-FO, but allows a LaTeX user to get started. We show the commonalities and differences between these two approaches to word processing.

Multidirectional Typesetting in xsl-fo

xsl-fo texts use an xml-like syntax that aims to describe high-quality print outputs. This article complements the introduction to xsl-fo (EuroBachoTeX 2007). We show how xsl-fo allows users to typeset texts belonging to different writing systems: from left to right, from right to left, and so on. We compare this implementation to TeX-like typesetting engines, e.g., XeTeX.

Publishing Workflows with XSL-FO

The abstract was not available at the time the proceedings were created. Please check an updated version [http://www.idealliance.org/papers/xml02/dx_xml02/html/abstract/.html] of the paper abstracts at the conference proceedings web site.

System Architecture for XML Offload to a Cell Processor-Based Workstation

This paper describes the design, prototype implementation, and evaluation of a system architecture for XML offload to a Cell processor-based workstation. This architecture includes a high-performance parser based on a novel enhanced finite state machine technology.

Publication date: 2002